Introduction

Regression analysis is a simple yet powerful method to find the relations within a dataset. In this post, we will look at the insurance charges data obtained from Kaggle (https://www.kaggle.com/mirichoi0218/insurance/home). This data set consists of 7 columns: age, sex, bmi, children, smoker, region and charges. We will get into more details about these variables later.

The key questions that we would be asking are:

  1. Is there a relationship between medical charges and other variables in the dataset?
  2. How strong is the relationship between the medical charges and other variables?
  3. Which variables have a strong relation to medical charges?
  4. How accurately can we estimate the effect of each variable on medical charges?
  5. How accurately can we predict future medical charges?
  6. Is the relationship linear?
  7. Is there synergy among the predictors?

We start with importing the required libraries:

library(magrittr)
library(purrr)
## 
## Attaching package: 'purrr'
## The following object is masked from 'package:magrittr':
## 
##     set_names
library(MASS)
library(car)
## Warning: package 'car' was built under R version 3.4.4
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
## 
## Attaching package: 'car'
## The following object is masked from 'package:purrr':
## 
##     some
library(broom)
## Warning: package 'broom' was built under R version 3.4.4
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
library(psych)
## Warning: package 'psych' was built under R version 3.4.4
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
## The following object is masked from 'package:car':
## 
##     logit
library(caret)
## Warning: package 'caret' was built under R version 3.4.4
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
## 
##     extract

We import the data from the CSV file. We can get an overview of the data using the summary() function.

insurance <- read.csv('insurance.csv')
summary(insurance)
##       age            sex           bmi           children     smoker    
##  Min.   :18.00   female:662   Min.   :15.96   Min.   :0.000   no :1064  
##  1st Qu.:27.00   male  :676   1st Qu.:26.30   1st Qu.:0.000   yes: 274  
##  Median :39.00                Median :30.40   Median :1.000             
##  Mean   :39.21                Mean   :30.66   Mean   :1.095             
##  3rd Qu.:51.00                3rd Qu.:34.69   3rd Qu.:2.000             
##  Max.   :64.00                Max.   :53.13   Max.   :5.000             
##        region       charges     
##  northeast:324   Min.   : 1122  
##  northwest:325   1st Qu.: 4740  
##  southeast:364   Median : 9382  
##  southwest:325   Mean   :13270  
##                  3rd Qu.:16640  
##                  Max.   :63770

The key points that can be taken from the summary are:

  1. The age of participants varies from 18 to 64.
  2. Around 49.48% of participants are female.
  3. The bmi of participants ranges from 15.96 to 53.13.
  4. Only 20.48% of the participants are smokers.
#insurance$age <- scale(insurance$age)
#insurance$bmi <- scale(insurance$bmi)
#insurance$children <- scale(insurance$children)

Is there a relationship between the medical charges and the predictors?

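Linear regression follows the formula:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon$$

where y is the response, x_1, ..., x_p are the predictors, \beta_0 is the intercept, \beta_1, ..., \beta_p are the coefficients and \epsilon is the error term.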

The coefficients in this linear equation denote the magnitude of additive relation between the predictor and the response.

As such, the null hypothesis is that there is no relation between any of the predictors and the response, which holds when all the predictor coefficients are 0. The alternate hypothesis is that at least one of the predictors is related to the outcome, that is, at least one predictor coefficient is non-zero.

This hypothesis is tested by computing the F-statistic. If there is no relationship between the predictors and the response, the F-statistic will be close to 1. If the alternate hypothesis is true, the F-statistic will be greater than 1. The p-value of the F-statistic is calculated from the number of records (n) and the number of predictors, and can then be used to determine whether the null hypothesis can be rejected.
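
As a minimal sketch of how the F-statistic and its p-value come out of n and the number of predictors, the snippet below computes them by hand and compares the result with what summary() reports. It uses R's built-in mtcars data and an arbitrary set of predictors, both of which are assumptions made only to keep the example self-contained; it is not the insurance data.

```r
# Illustration on the built-in mtcars data (an assumption, not the insurance data)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

n   <- nrow(mtcars)                             # number of records
p   <- length(coef(fit)) - 1                    # number of predictors (intercept excluded)
rss <- sum(residuals(fit)^2)                    # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares

# F = ((TSS - RSS) / p) / (RSS / (n - p - 1))
f_manual <- ((tss - rss) / p) / (rss / (n - p - 1))

# Upper-tail probability of the F distribution with (p, n - p - 1) degrees of freedom
p_value <- pf(f_manual, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

# Value reported by summary() for comparison
f_reported <- unname(summary(fit)$fstatistic["value"])
print(c(manual = f_manual, reported = f_reported, p_value = p_value))
```

The two F values should agree, and the degrees of freedom (p and n - p - 1) are the same pair that summary() prints next to the F-statistic.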

We will start with fitting a multiple linear regression model using all the predictors:

lm.fit <- lm(formula = charges~., data = insurance)
#charges~. is the formula being used for linear regression. Here '.' means all the predictors in the dataset.
summary(lm.fit)
## 
## Call:
## lm(formula = charges ~ ., data = insurance)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11304.9  -2848.1   -982.1   1393.9  29992.8 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -11938.5      987.8 -12.086  < 2e-16 ***
## age                256.9       11.9  21.587  < 2e-16 ***
## sexmale           -131.3      332.9  -0.394 0.693348    
## bmi                339.2       28.6  11.860  < 2e-16 ***
## children           475.5      137.8   3.451 0.000577 ***
## smokeryes        23848.5      413.1  57.723  < 2e-16 ***
## regionnorthwest   -353.0      476.3  -0.741 0.458769    
## regionsoutheast  -1035.0      478.7  -2.162 0.030782 *  
## regionsouthwest   -960.0      477.9  -2.009 0.044765 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6062 on 1329 degrees of freedom
## Multiple R-squared:  0.7509, Adjusted R-squared:  0.7494 
## F-statistic: 500.8 on 8 and 1329 DF,  p-value: < 2.2e-16

The high F-statistic (500.8), with a significant p-value (< 2.2e-16), implies that the null hypothesis can be rejected. This means that at least one of the predictors is potentially related to the outcome.

RSE (Residual Standard Error) is the estimate of the standard deviation of the irreducible error. In simpler words, it is roughly the average amount by which the actual outcome deviates from the fitted regression line. Hence, a large RSE means a large deviation from the true regression line, which makes RSE useful for judging the lack of fit of the model to the data. RSE is measured in the units of the response, so it has to be read against the scale of charges: the RSE of our model (6062) is large compared to the mean charge of 13270, indicating that the model doesn’t fit the data well.

R-squared measures the proportion of variability in Y that can be explained by X, and is always between 0 and 1. The high adjusted R-squared (0.7494; the multiple R-squared is 0.7509) shows that around 75% of the variance in charges is explained by the model.
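
To make the definitions of RSE and R-squared concrete, here is a small sketch that recomputes them from the residuals. The built-in mtcars data and the chosen predictors are assumptions used only so the snippet runs on its own.

```r
# Illustration on the built-in mtcars data (an assumption, not the insurance data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)
p   <- length(coef(fit)) - 1                    # number of predictors
rss <- sum(residuals(fit)^2)                    # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares

rse    <- sqrt(rss / (n - p - 1))               # residual standard error
r2     <- 1 - rss / tss                         # multiple R-squared
adj_r2 <- 1 - (rss / (n - p - 1)) / (tss / (n - 1))  # adjusted R-squared

# RSE relative to the mean response gives a scale-free sense of the error
rse_relative <- rse / mean(mtcars$mpg)

print(c(rse = rse, r2 = r2, adj_r2 = adj_r2, rse_relative = rse_relative))
```

Applying the same ratio to the insurance model gives 6062 / 13270, or roughly 46% of the mean charge.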

Which variables have a strong relation to medical charges?

If we look at the p-values of the estimated coefficients above, we see that not all the coefficients are statistically significant. This suggests that only a subset of the predictors is related to the outcome. The question is which ones.

We can select variables by looking at the individual p-values. This is not a problem when the number of predictors (6) is small compared to the number of observations (1338). It won’t work, however, when the number of predictors is greater than the number of observations. In such cases, we have to use feature/variable selection methods such as forward selection, backward selection, or mixed selection. Before moving on to feature selection using any of these methods, let us try regression using only the features with significant p-values.
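
For reference, forward selection can be sketched with the same stepAIC() function from MASS by starting from an intercept-only model and giving it an upper scope. The mtcars data and the candidate predictors below are assumptions chosen only to keep the example runnable on its own.

```r
library(MASS)  # provides stepAIC()

# Intercept-only starting model on the built-in mtcars data (illustrative assumption)
null_fit <- lm(mpg ~ 1, data = mtcars)

# Forward selection: add one candidate at a time while the AIC keeps improving
forward_fit <- stepAIC(
  null_fit,
  scope = list(lower = ~ 1, upper = ~ wt + hp + qsec + drat),
  direction = "forward",
  trace = FALSE
)

print(formula(forward_fit))                       # predictors that were kept
print(c(null = AIC(null_fit), forward = AIC(forward_fit)))
```

Backward selection works the other way round: start from the full model and pass direction = "backward".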

lm.fit.sel <- lm(charges~age+bmi+children+smoker+region, data = insurance)
summary(lm.fit.sel)
## 
## Call:
## lm(formula = charges ~ age + bmi + children + smoker + region, 
##     data = insurance)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11367.2  -2835.4   -979.7   1361.9  29935.5 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -11990.27     978.76 -12.250  < 2e-16 ***
## age                256.97      11.89  21.610  < 2e-16 ***
## bmi                338.66      28.56  11.858  < 2e-16 ***
## children           474.57     137.74   3.445 0.000588 ***
## smokeryes        23836.30     411.86  57.875  < 2e-16 ***
## regionnorthwest   -352.18     476.12  -0.740 0.459618    
## regionsoutheast  -1034.36     478.54  -2.162 0.030834 *  
## regionsouthwest   -959.37     477.78  -2.008 0.044846 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6060 on 1330 degrees of freedom
## Multiple R-squared:  0.7509, Adjusted R-squared:  0.7496 
## F-statistic: 572.7 on 7 and 1330 DF,  p-value: < 2.2e-16

We will compare this to mixed variable selection, which is a combination of forward selection and backward selection.

step.lm.fit <- stepAIC(lm.fit, direction = "both", trace = FALSE)
summary(step.lm.fit)
## 
## Call:
## lm(formula = charges ~ age + bmi + children + smoker + region, 
##     data = insurance)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11367.2  -2835.4   -979.7   1361.9  29935.5 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -11990.27     978.76 -12.250  < 2e-16 ***
## age                256.97      11.89  21.610  < 2e-16 ***
## bmi                338.66      28.56  11.858  < 2e-16 ***
## children           474.57     137.74   3.445 0.000588 ***
## smokeryes        23836.30     411.86  57.875  < 2e-16 ***
## regionnorthwest   -352.18     476.12  -0.740 0.459618    
## regionsoutheast  -1034.36     478.54  -2.162 0.030834 *  
## regionsouthwest   -959.37     477.78  -2.008 0.044846 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6060 on 1330 degrees of freedom
## Multiple R-squared:  0.7509, Adjusted R-squared:  0.7496 
## F-statistic: 572.7 on 7 and 1330 DF,  p-value: < 2.2e-16

The model given by stepwise selection is the same as the model we got by selecting predictors with significant p-values, so the simple method of selecting predictors on the basis of p-values works in this case.

We can see a very slight improvement in the adjusted R-squared of the model (0.7494 -> 0.7496), along with a very slight decrease in RSE (6062 -> 6060), which is also an improvement.

Some key inferences to be taken from the model are:

  1. Charges increase with the age of the key beneficiary. For every 1-year increase in the age of the key beneficiary, keeping everything else fixed, charges increase by around $256.97.
  2. Similar relations can be seen for other predictors. Higher charges are expected with higher BMI or higher number of children/dependents or if the person is a smoker.
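
To see how these coefficients combine into a prediction, the sketch below plugs the estimates from the stepwise model summary into the regression equation by hand. The beneficiary profile (age 40, BMI 30, 2 children, northeast region) is hypothetical and chosen only for illustration.

```r
# Coefficients copied from the stepwise model summary above
coefs <- c(intercept = -11990.27, age = 256.97, bmi = 338.66,
           children = 474.57, smokeryes = 23836.30)

# Hypothetical beneficiary in the baseline region (northeast), so no region term applies
age <- 40
bmi <- 30
children <- 2

non_smoker <- coefs[["intercept"]] + coefs[["age"]] * age +
  coefs[["bmi"]] * bmi + coefs[["children"]] * children

# The smoker coefficient is a constant shift, everything else being equal
smoker <- non_smoker + coefs[["smokeryes"]]

print(c(non_smoker = non_smoker, smoker = smoker))
# non_smoker = 9397.47, smoker = 33233.77
```

With the fitted model object at hand, predict(step.lm.fit, newdata = ...) does the same calculation without copying coefficients.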

Is the relationship linear?

By applying linear regression, we are assuming that there is a linear relationship between the predictors and the outcome. If the underlying relationship is quite far from linear, then most of the inferences we have made so far are doubtful. It also means reduced accuracy of the model.

Non-linearity can be detected using residual plots. For multiple linear regression, we can plot the residuals versus the fitted values. The presence of a pattern in the residual plot would imply a problem with the linearity assumption of the model.
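
As a rough illustration of what such a pattern looks like, the sketch below fits a straight line to simulated data twice: once where the truth is linear and once where it is curved. The data, the seed and the helper curvature_r2() are all made up for this example.

```r
set.seed(42)
x <- runif(200, 0, 10)
y_linear <- 2 + 3 * x + rnorm(200)               # truly linear relationship
y_curved <- 2 + 3 * x + 0.5 * x^2 + rnorm(200)   # curved relationship

fit_linear <- lm(y_linear ~ x)
fit_curved <- lm(y_curved ~ x)                   # straight line fitted to curved data

# Residuals vs fitted values, with a lowess smoother to reveal any pattern
par(mfrow = c(1, 2))
plot(fitted(fit_linear), residuals(fit_linear), main = "Linear truth",
     xlab = "Fitted values", ylab = "Residuals")
lines(lowess(fitted(fit_linear), residuals(fit_linear)), col = "blue")
plot(fitted(fit_curved), residuals(fit_curved), main = "Curved truth",
     xlab = "Fitted values", ylab = "Residuals")
lines(lowess(fitted(fit_curved), residuals(fit_curved)), col = "blue")

# Hypothetical helper: share of residual variance explained by a quadratic
# in the fitted values (close to 0 when the linear fit is adequate)
curvature_r2 <- function(fit) {
  summary(lm(residuals(fit) ~ poly(fitted(fit), 2)))$r.squared
}
print(c(linear = curvature_r2(fit_linear), curved = curvature_r2(fit_curved)))
```

The first panel should show a flat, structureless cloud, while the second shows a clear U shape.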

residualPlot(step.lm.fit, type = "pearson", id=TRUE)

The blue line is a smooth quadratic fit of the residuals (response) against the fitted values (regressor). The curve is quite close to a straight line, indicating that the underlying data is approximately linear. (The points labelled 1301 and 578 are discussed later.)

We can further plot the residual plots of individual predictors and residuals to see if any of the predictors demonstrate non-linearity.

#residualPlots(step.lm.fit)

We don’t see any non-linearity with respect to individual predictors either.

par(mfrow=c(2,2))
plot(step.lm.fit)

One of the methods of fixing the problem of non-linearity is introducing interaction between the predictors. Out of the predictors that we have, an interaction of bmi and smoker may have an effect on the charges. Let’s update the model and see if that makes a difference:

lm.fit1 <- update(step.lm.fit, ~ .+bmi*smoker)
lm.fit1 %>%
  summary()
## 
## Call:
## lm(formula = charges ~ age + bmi + children + smoker + region + 
##     bmi:smoker, data = insurance)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14655.4  -1918.9  -1313.4   -489.7  30333.1 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -2453.564    857.695  -2.861  0.00429 ** 
## age                264.042      9.522  27.729  < 2e-16 ***
## bmi                 22.615     25.620   0.883  0.37756    
## children           512.713    110.266   4.650 3.65e-06 ***
## smokeryes       -20309.092   1648.861 -12.317  < 2e-16 ***
## regionnorthwest   -581.704    381.215  -1.526  0.12727    
## regionsoutheast  -1207.011    383.109  -3.151  0.00167 ** 
## regionsouthwest  -1227.601    382.576  -3.209  0.00136 ** 
## bmi:smokeryes     1438.108     52.630  27.325  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4851 on 1329 degrees of freedom
## Multiple R-squared:  0.8405, Adjusted R-squared:  0.8395 
## F-statistic: 875.4 on 8 and 1329 DF,  p-value: < 2.2e-16
lm.fit1 %>%
  residualPlot(type = "pearson", id=TRUE)

par(mfrow=c(2,2))
lm.fit1 %>%
  plot()

#residualPlots(lm.fit1)

Looking at the residual plot, we can see that the relation between the fitted values and the residuals is more linear now. Moreover, the adjusted R-squared is higher (0.7496 -> 0.8395) and the F-statistic has improved too (572.7 -> 875.4). RSE has also decreased (6060 -> 4851).
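
The interaction term changes how the bmi and smoker coefficients should be read, which the short calculation below tries to make explicit. It only uses the estimates printed in the lm.fit1 summary; the BMI of 30 used for the comparison is an arbitrary illustrative value.

```r
# Coefficients copied from the lm.fit1 summary above
b_bmi         <- 22.615
b_smoker      <- -20309.092
b_interaction <- 1438.108

# Effect of one extra unit of BMI on charges, by smoking status
slope_non_smoker <- b_bmi
slope_smoker     <- b_bmi + b_interaction

# Predicted smoker vs non-smoker gap at a given BMI (illustrative value of 30)
gap_at_30 <- b_smoker + b_interaction * 30

# BMI at which the predicted gap would be zero
break_even_bmi <- -b_smoker / b_interaction

print(c(slope_non_smoker = slope_non_smoker, slope_smoker = slope_smoker,
        gap_at_30 = gap_at_30, break_even_bmi = break_even_bmi))
```

The negative smokeryes coefficient is therefore not a discount for smokers: the break-even BMI (about 14.1) lies below the smallest BMI in the data (15.96), so smokers are predicted to pay more over the whole observed range.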

Correlation of error terms

An important assumption of the linear regression model is that consecutive error terms are uncorrelated. The standard errors of the estimated regression coefficients are calculated on this basis. Hence, if consecutive error terms are correlated, the computed standard errors tend to underestimate the true standard errors, which may be much larger.

We can check the auto-correlation of residuals using the Durbin-Watson test. The null hypothesis is that the residuals have no auto-correlation. The alternate hypothesis is that the residuals have a statistically significant auto-correlation:

set.seed(1)
# Test for Autocorrelated Errors
durbinWatsonTest(lm.fit1, max.lag = 5, reps=1000)
##  lag Autocorrelation D-W Statistic p-value
##    1    -0.036922612      2.071895   0.194
##    2    -0.030476397      2.058983   0.278
##    3    -0.011398321      2.020746   0.668
##    4     0.003598824      1.982755   0.788
##    5    -0.003464141      1.996622   0.950
##  Alternative hypothesis: rho[lag] != 0

Here we are checking the auto-correlation of the residuals at 5 different lags. None of the p-values is less than 0.05. Hence, we cannot reject the null hypothesis of no auto-correlation.
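
The lag-1 D-W statistic itself is simple to compute: it is the sum of squared differences between consecutive residuals divided by the sum of squared residuals, and it is approximately 2 * (1 - r), where r is the lag-1 auto-correlation. The sketch below illustrates this on simulated data; the seed, the sample size, the AR(1) coefficient of 0.7 and the helper dw_stat() are all assumptions made for the example.

```r
set.seed(1)
n <- 500
x <- rnorm(n)

# Response with independent errors
y_indep <- 1 + 2 * x + rnorm(n)

# Response with AR(1) errors (positive auto-correlation of 0.7)
e_ar <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y_ar <- 1 + 2 * x + e_ar

# Hypothetical helper: lag-1 Durbin-Watson statistic from a fitted model
dw_stat <- function(fit) {
  e <- residuals(fit)
  sum(diff(e)^2) / sum(e^2)
}

dw_indep <- dw_stat(lm(y_indep ~ x))   # expected to be near 2
dw_ar    <- dw_stat(lm(y_ar ~ x))      # expected to be well below 2
print(c(independent = dw_indep, autocorrelated = dw_ar))
```

A value near 2 indicates no auto-correlation, which is what we see for the insurance model: a lag-1 auto-correlation of about -0.037 corresponds to a statistic of about 2.07.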

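# Collect the residuals with their row index
# (tidy() on bare vectors is deprecated in recent broom releases)
res <- data.frame(names = as.numeric(names(residuals(lm.fit1))),
                  x = residuals(lm.fit1))
res %>%
  ggplot() +
  geom_point(aes(x=names, y=x)) +
  labs(x='index', y='residuals')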

Non-constant variance of error terms

Constant variance of the error terms is another assumption of a linear regression model. With non-constant variance, the spread of the errors may, for instance, change with the value of the response variable. One way to spot non-constant variance is a funnel shape in the residual plot. A more concrete way is an extension of the Breusch-Pagan test, available in R as ncvTest() in the car package. It tests the null hypothesis of constant variance against the alternate hypothesis that the error variance changes with the level of the response or with a linear combination of predictors.

# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(lm.fit1)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 24.00927    Df = 1     p = 9.58731e-07
lmtest::bptest(lm.fit1)
## 
##  studentized Breusch-Pagan test
## 
## data:  lm.fit1
## BP = 10.088, df = 8, p-value = 0.2589
# plot studentized residuals vs. fitted values 
spreadLevelPlot(lm.fit1)

## 
## Suggested power transformation:  0.5149389

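A very low p-value (~9.59e-07) from ncvTest() means the null hypothesis of constant variance can be rejected. In other words, there is strong evidence that the errors have non-constant variance. Note that the studentized Breusch-Pagan test from lmtest does not reject the null hypothesis here (p = 0.2589); it tests the variance against all the predictors rather than against the fitted values, so the two tests need not agree. From the graph, we can also see how the spread of the absolute studentized residuals varies with the fitted values. One way to fix this problem is to transform the outcome variable.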
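
To show what a test of this kind does under the hood, here is a minimal sketch of the studentized (Koenker) form of the Breusch-Pagan test: regress the squared residuals on the predictors and compare n times the R-squared of that auxiliary regression with a chi-square distribution. The simulated data, in which the error spread grows with x, is an assumption made purely for illustration.

```r
set.seed(7)
n <- 400
x <- runif(n, 1, 10)
y <- 5 + 2 * x + rnorm(n, sd = 0.5 * x)   # error spread grows with x
fit <- lm(y ~ x)

# Auxiliary regression of the squared residuals on the predictor
aux <- lm(residuals(fit)^2 ~ x)

# Test statistic: n * R-squared, chi-square with one df per auxiliary predictor
bp_stat <- n * summary(aux)$r.squared
bp_df   <- length(coef(aux)) - 1
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(statistic = bp_stat, df = bp_df, p_value = bp_p))
```

A small p-value here points to non-constant variance, as expected for data simulated this way.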

yTransformer <- 0.78

trans.lm.fit <- update(lm.fit1, charges^yTransformer~.)
trans.lm.fit %>%
  summary
## 
## Call:
## lm(formula = charges^yTransformer ~ age + bmi + children + smoker + 
##     region + bmi:smoker, data = insurance)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1061.61  -185.71  -129.90   -44.52  2384.83 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -40.330     78.901  -0.511 0.609337    
## age                27.541      0.876  31.440  < 2e-16 ***
## bmi                 2.578      2.357   1.094 0.274258    
## children           58.143     10.144   5.732 1.23e-08 ***
## smokeryes       -1366.140    151.682  -9.007  < 2e-16 ***
## regionnorthwest   -59.747     35.069  -1.704 0.088670 .  
## regionsoutheast  -124.153     35.243  -3.523 0.000442 ***
## regionsouthwest  -121.979     35.194  -3.466 0.000545 ***
## bmi:smokeryes     114.430      4.841  23.635  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 446.3 on 1329 degrees of freedom
## Multiple R-squared:  0.836,  Adjusted R-squared:  0.835 
## F-statistic:   847 on 8 and 1329 DF,  p-value: < 2.2e-16
trans.lm.fit %>%
  residualPlot()

# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(trans.lm.fit)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.005511406    Df = 1     p = 0.9408203
# plot studentized residuals vs. fitted values 
spreadLevelPlot(trans.lm.fit)

## 
## Suggested power transformation:  0.8027873

A p-value of 0.94 implies that we cannot reject the null hypothesis of constant variance of the error terms. However, there is a slight decrease in both the adjusted R-squared (0.8395 -> 0.835) and the F-statistic (875.4 -> 847).

This can be fixed further by looking at relations between individual predictors and outcome.
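
One practical consequence of transforming the outcome is that the model now predicts charges^yTransformer, so predictions have to be raised to the power 1/yTransformer to get back to the original scale. The sketch below shows this on simulated data; the data-generating formula, the seed and the variable names are assumptions, and only the exponent 0.78 is taken from the post.

```r
set.seed(3)
n <- 300
x <- runif(n, 20, 60)
y <- 1000 + 250 * x + rnorm(n, sd = 5 * x)   # positive response, spread grows with x
dat <- data.frame(x = x, y = y)

power <- 0.78                                # same exponent as yTransformer
fit <- lm(I(y^power) ~ x, data = dat)        # model fitted on the transformed scale

new_obs <- data.frame(x = c(25, 40, 55))
pred_transformed <- predict(fit, newdata = new_obs)   # still on the y^0.78 scale
pred_original    <- pred_transformed^(1 / power)      # back on the original scale

print(data.frame(x = new_obs$x, pred_transformed, pred_original))
```

The same back-transformation applies to predict(trans.lm.fit) before comparing predictions with the actual charges.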


Outliers

Outliers are the observations which in some way are quite different from the distribution of the data. With respect to a model, an outlier is an observation whose predicted outcome is much different from the actual value of the outcome.

Residual plots can be used to identify outliers. To compare residuals on a standard scale, we can use studentized residuals. Usually, observations whose studentized residuals exceed 3 in absolute value are possible outliers.
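
A minimal sketch of this rule, using rstudent() from base R on simulated data with one artificially injected outlier (the data, the seed and the position of the outlier are assumptions made for the example):

```r
set.seed(11)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[10] <- y[10] + 15              # inject one artificial outlier at row 10

fit <- lm(y ~ x)

# Studentized residuals put every residual on a comparable scale
stud_res <- rstudent(fit)

# Flag observations whose studentized residual exceeds 3 in absolute value
flagged <- which(abs(stud_res) > 3)
print(flagged)
```

The same two lines, rstudent() followed by which(), can be applied to trans.lm.fit to list candidate outliers in the insurance data.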

#temp <- update(trans.lm.fit, ~.+age*smoker+bmi*smoker)
temp <- trans.lm.fit

insCopy <- insurance
insCopy$charges <- (insurance$charges)^yTransformer
insCopy$predicted <- predict(temp)
insCopy$residuals <- residuals(temp)

insCopy %>%
  ggplot(aes(x=charges, y=predicted)) +
  geom_point()

insCopy %>%
  ggplot(aes(x=charges, y=residuals)) +
  geom_point()

insCopy %>%
  ggplot(aes(x=predicted, y=residuals)) +
  geom_point()

Care should be taken to not simply remove the outliers on the basis of analysis. If an outlier is due to …

insurance %>%
  keep(is.numeric) %>%
  outlier(bad=5)

##          1          2          3          4          5          6 
##  3.6433888  3.1674026  3.9469491  3.9681693  1.4297121  1.8908201 
##          7          8          9         10         11         12 
##  0.8477352  3.0126636  0.9478199  5.3197817  2.3497919  5.3411657 
##         13         14         15         16         17         18 
##  3.1544114  4.8546525 10.7549074  2.9361032  1.0863000  3.1844687 
##         19         20         21         22         23         24 
##  5.1707591  6.6969767  3.9617472  0.9894015  3.9860270  5.1695420 
##         25         26         27         28         29         30 
##  1.0727200  4.8557585  5.8862232  2.0842255  5.6254718  6.7812874 
##         31         32         33         34         35         36 
##  8.1480817  3.4092906 13.2321635  4.2650673 13.3509816  5.1710705 
##         37         38         39         40         41         42 
##  5.1442065  4.0592557  6.2104265 10.8443224  2.3788908  2.6334541 
##         43         44         45         46         47         48 
##  2.3623613  0.9764010  1.7680791  3.1675947  5.6200843  2.4693758 
##         49         50         51         52         53         54 
##  4.5656999  5.4044929  4.3572848  3.0788991  1.3052442  5.9726733 
##         55         56         57         58         59         60 
##  2.8752698  8.7614166  2.4707966  8.0082893  3.5835225  2.6062221 
##         61         62         63         64         65         66 
##  3.1192559  7.8795474  5.8270844  1.2976043  4.7217989  3.0375987 
##         67         68         69         70         71         72 
##  4.7966133  0.7733294  2.2651952  4.8333783  2.8769952 11.3875998 
##         73         74         75         76         77         78 
##  3.8261869  2.1140908  1.2866290  3.0560292  0.8633888  3.8610591 
##         79         80         81         82         83         84 
##  5.5600616  1.4048088  0.8952471  3.0975881  8.2902004  9.5104252 
##         85         86         87         88         89         90 
##  6.0137226  3.0487588  7.9623552  2.9886573  1.5950314  2.8366008 
##         91         92         93         94         95         96 
##  5.6441014  2.2366684  5.4019619  1.7146104  9.3848063  2.8031816 
##         97         98         99        100        101        102 
##  3.7423789  4.0765149  6.5910041  4.6814286  1.2797538  2.0141919 
##        103        104        105        106        107        108 
##  4.5050181  5.8495836  0.6434536  2.6987907  2.3316487  1.8841130 
##        109        110        111        112        113        114 
##  1.7531496  9.8683034  1.6362869  2.0636598  1.2824125  3.8215941 
##        115        116        117        118        119        120 
##  3.5474268  4.7478183 12.2082067  1.3283169  1.9982178  2.1525143 
##        121        122        123        124        125        126 
##  2.3681535  4.0282608  2.8211779  4.9586539  3.3447379  1.9129756 
##        127        128        129        130        131        132 
##  3.6191784  3.3059890 10.0164249  1.6160877  3.6982212  5.8852503 
##        133        134        135        136        137        138 
##  2.4834978  3.4035925  2.8089208  2.5546002  3.7835724  2.9874185 
##        139        140        141        142        143        144 
##  4.1124051  3.7939917  4.8837895  1.4470207  1.9754777  1.5428341 
##        145        146        147        148        149        150 
##  3.7625853  5.8166723  7.4621945  2.4344624  2.4706383  2.3735659 
##        151        152        153        154        155        156 
##  1.3719574  1.6611045  4.6183239  2.9973021  0.9205099  3.7647948 
##        157        158        159        160        161        162 
##  3.0191587  4.2662753  6.7765036  3.4202051  2.0279842  9.8622510 
##        163        164        165        166        167        168 
##  3.6406055  1.2065020  1.2158847  6.4813307 14.6303441  3.4509658 
##        169        170        171        172        173        174 
##  2.4735540  6.8797384  6.9744296  1.7408136  8.0755348  1.1451624 
##        175        176        177        178        179        180 
##  2.6259668 10.9158341  1.0784765  1.5156274  1.1679632  3.0310321 
##        181        182        183        184        185        186 
##  3.1288016  5.5931929  6.9320392  1.6980358  1.0616009 11.5006390 
##        187        188        189        190        191        192 
##  1.8244316  3.3932881  0.5035362  1.6029921  3.6041342  1.5943666 
##        193        194        195        196        197        198 
##  2.4881662  2.2016950  4.0824750  3.1052205  1.4140601  1.1364970 
##        199        200        201        202        203        204 
##  6.3080607  5.9673617  3.2461091  0.8164503  4.7398087  7.5823331 
##        205        206        207        208        209        210 
##  3.2783244  0.9237383  3.7876442  1.5375826  4.0151484  3.8528263 
##        211        212        213        214        215        216 
##  2.5814515  6.1834936  2.1329785  0.7588599  1.0578135  2.2799825 
##        217        218        219        220        221        222 
##  2.6035551  3.0018002  1.2038367  5.7953586  0.9269977  2.2801371 
##        223        224        225        226        227        228 
##  3.2594505  8.5763613  2.4148224  4.1713246  3.8688974  5.7525077 
##        229        230        231        232        233        234 
##  0.3839463  1.8634350  1.3734227  4.8346993  6.6437035  2.6231921 
##        235        236        237        238        239        240 
##  1.7961845  3.0725104  3.3874634  3.4783599  3.5806613  2.2508880 
##        241        242        243        244        245        246 
##  8.7079938  2.1400240  4.7838398  3.6591056  5.4025542  2.6193865 
##        247        248        249        250        251        252 
##  4.7031858  3.4787934  4.2626332  0.8642175  7.7068250  9.1187522 
##        253        254        255        256        257        258 
##  6.9697461  3.7545653  6.5076805  4.7209368  7.8155523  4.1257882 
##        259        260        261        262        263        264 
##  7.7862177  7.9679080  3.9016171  2.8359963  5.2683240  9.5965214 
##        265        266        267        268        269        270 
##  4.5460995 11.5132448  3.6635037  4.5115132  0.5751948  1.4271602 
##        271        272        273        274        275        276 
##  2.5579078  6.3223241  2.2753471  1.1817449  2.1815786  1.5427083 
##        277        278        279        280        281        282 
##  5.0572511  3.2013219  2.7385667  3.3337913  0.9297860 11.5573141 
##        283        284        285        286        287        288 
##  0.8710642  1.5583753  2.0348484  1.0186729  9.7464946  4.7152015 
##        289        290        291        292        293        294 
##  8.7642013  4.6562715  2.1629128  1.2895539 13.7761901  2.5194832 
##        295        296        297        298        299        300 
##  4.2056324  4.2731726  3.5852666  1.7548478  8.2768092  0.7672942 
##        301        302        303        304        305        306 
##  3.0930216  6.2000350  3.3758844  1.9066911  3.6998279  1.8121602 
##        307        308        309        310        311        312 
##  2.2465537  1.1592686  3.3704811  1.1197716  2.2672594  3.5887547 
##        313        314        315        316        317        318 
##  8.1332256  2.5653854  6.3504783  2.2062885  1.8820963  2.4028485 
##        319        320        321        322        323        324 
##  1.4743079  2.2391255  1.0776485  8.2810138  5.4115393  5.5791923 
##        325        326        327        328        329        330 
##  1.8324850  0.8159224  2.1821988  6.6429412  9.1966316  3.1177990 
##        331        332        333        334        335        336 
##  9.1758387  2.8694282  3.5301037  2.6895401  1.3185402  4.5684239 
##        337        338        339        340        341        342 
##  4.1872482  3.2378939  5.7027492  0.7794315  2.9888388  3.8274202 
##        343        344        345        346        347        348 
##  3.6721136  4.8371377  9.7524919  3.0488390  2.2030993  0.8207464 
##        349        350        351        352        353        354 
##  0.4166913  3.0921948  4.4504148  2.4766723  1.6461561  1.6474045 
##        355        356        357        358        359        360 
##  5.1604585  2.3935389  8.1534435  3.0856134  6.9121088  5.1687505 
##        361        362        363        364        365        366 
##  1.2699220  0.5188823  5.1403275  2.1754695  4.1342676  0.7641314 
##        367        368        369        370        371        372 
##  4.0183238  1.6812878  1.0984445  5.2462495  6.4042830  4.8842239 
##        373        374        375        376        377        378 
##  0.5873150  6.4742112  3.3925845  2.9145235  4.1065312  9.9946176 
##        379        380        381        382        383        384 
##  5.5736833  3.1575802  5.9527008  7.3622234  2.3489488  6.1170293 
##        385        386        387        388        389        390 
##  2.8740700  3.8704053  4.7811459  3.9452291  3.2265040  4.0905956 
##        391        392        393        394        395        396 
##  7.0950847  4.9174467  0.7466020  0.8271182  1.1423723  4.5209618 
##        397        398        399        400        401        402 
##  3.3558780  2.9682239  4.7141826  5.5368693  4.7065175  8.9518259 
##        403        404        405        406        407        408 
##  4.2909539  3.2996465  3.8796409  3.1949983  2.1579409  1.9567741 
##        409        410        411        412        413        414 
##  5.1471125  0.6708779  6.8531377  3.7978100  6.5645546 13.0476625 
##        415        416        417        418        419        420 
##  4.0066368  1.2599462  2.4176932  2.8843482  5.0436366  5.5245749 
##        421        422        423        424        425        426 
##  9.9324764  9.4750108  4.9765548  2.1127936  1.2930363 12.0081394 
##        427        428        429        430        431        432 
##  0.5294248  2.9981612  6.2771402  3.8144579  4.7265778  3.9194765 
##        433        434        435        436        437        438 
##  1.5758198  3.3597131  0.7300365  4.8460264  2.6775570  3.0620188 
##        439        440        441        442        443        444 
## 18.4550618  1.9455444  0.9015921  6.0867235  8.7356085  3.2073494 
##        445        446        447        448        449        450 
##  2.8644503  1.6007424  3.4033430  3.3534419  1.2021491  2.6849669 
##        451        452        453        454        455        456 
##  6.2310982  1.6219520  3.2333088  2.8709483  9.2108468  3.8889212 
##        457        458        459        460        461        462 
##  2.0393102  2.7539344  4.8124709  3.0780254  4.2241032  1.5269585 
##        463        464        465        466        467        468 
##  4.5079890  3.2887046  3.4838599  1.1906739  2.5913981  2.3642921 
##        469        470        471        472        473        474 
##  3.2695872  3.2306748  2.1891944  3.2086076  3.0483109  1.5583886 
##        475        476        477        478        479        480 
##  5.2464734  3.7100821  7.5521777  3.1943784  4.3814936  2.7165763 
##        481        482        483        484        485        486 
##  8.0780710  2.7421623  3.3712178  3.2271821  3.5823505  1.4687812 
##        487        488        489        490        491        492 
##  6.2189482  3.0844112 10.6234434  1.2798902  3.4332115  5.0857476 
##        493        494        495        496        497        498 
##  3.6490255  7.8236609  8.8947207  2.8881484  2.3282166  1.1741008 
##        499        500        501        502        503        504 
##  2.1287653  5.5581049  6.5335234  1.7493740  3.0416574  7.5859634 
##        505        506        507        508        509        510 
##  0.4045176  2.9343432  1.9073675  3.4644119  2.6358349  2.9197387 
##        511        512        513        514        515        516 
##  1.7329927  2.3985317  3.7737317  3.1279903  0.7428655  3.6361908 
##        517        518        519        520        521        522 
##  5.1030592  1.0610215  0.4904305  1.5054246  2.8861659  7.3578915 
##        523        524        525        526        527        528 
##  2.1512582  2.9012910  5.8450544  3.5491018  4.5320898  1.6783706 
##        529        530        531        532        533        534 
##  2.8714757  3.5957465 10.5053823  3.7423527  2.7798664  1.9281083 
##        535        536        537        538        539        540 
##  6.6402941  0.4715129  5.4119610  1.1051106  0.7334534  2.8282867 
##        541        542        543        544        545        546 
##  4.8916963  2.9572461  4.7099818 22.7256860  2.3164738  2.4480787 
##        547        548        549        550        551        552 
##  2.7143546  8.9339550  2.0454925 13.0757988  4.0514502  1.4170427 
##        553        554        555        556        557        558 
##  6.5081898  1.6208717  5.4957065  2.5472637  0.8368322  1.9186744 
##        559        560        561        562        563        564 
##  8.0345836  4.1954101  4.1584569  2.3452892  1.9227895  6.6501365 
##        565        566        567        568        569        570 
##  3.4161521  3.0530644  3.4653577  0.9440482 11.1549245  8.8063118 
##        571        572        573        574        575        576 
##  1.4779057  4.3740755  6.4608908  4.2841125  2.5770779  3.3461905 
##        577        578        579        580        581        582 
##  2.7177004 17.6011420  1.1921997  3.0204329  3.1141148  3.1052205 
##        583        584        585        586        587        588 
##  7.6433567  2.0322275  5.0571566  0.6119191  6.9438139  8.1402135 
##        589        590        591        592        593        594 
##  4.1310269  0.3887700  3.0711396  3.9600145  2.9250580  4.0895873 
##        595        596        597        598        599        600 
##  4.2066109  0.8329048  0.9587050  0.7658059  1.1814359  3.9882857 
##        601        602        603        604        605        606 
##  6.0682318  1.9460575  3.4819729  7.1450466  3.6794276  2.2500809 
##        607        608        609        610        611        612 
##  2.3077086  5.4892063  1.7573198  7.5499472  0.6824379  1.5783599 
##        613        614        615        616        617        618 
##  3.6316281  6.3318447  3.2632096  6.3704856  2.7134178  2.4599062 
##        619        620        621        622        623        624 
##  8.2697024  3.5567338  0.9166201 10.9303219  3.2067849  8.6803651 
##        625        626        627        628        629        630 
##  3.3014288  1.9478270  2.9617908  4.3243037  4.3980013  8.1606508 
##        631        632        633        634        635        636 
##  2.1296286  2.2633320  2.6477177  2.4532201  3.3991633  5.5464276 
##        637        638        639        640        641        642 
##  2.9044778  2.9713632  1.9240583  7.4936433 15.6247937  5.3730550 
##        643        644        645        646        647        648 
##  3.7609006  5.0878597  1.2066544  3.1228691  0.7808288  4.1048463 
##        649        650        651        652        653        654 
##  3.2320869  3.0161562  5.4390422  3.4642497  1.6112507  2.0400100 
##        655        656        657        658        659        660 
##  3.6319512  3.0353804  5.9067286  2.0742918  1.7122514  7.5145985 
##        661        662        663        664        665        666 
## 10.6541320  3.5322137  0.6477985  3.8669381  7.1212023  7.0359564 
##        667        668        669        670        671        672 
##  0.9565148  5.6572746  9.2888646  0.3713842  3.5064816  1.6497575 
##        673        674        675        676        677        678 
##  1.2772619  1.2393817 10.8391516  3.6034054  6.6062836  9.9873845 
##        679        680        681        682        683        684 
##  4.8080873  2.8349889  5.8907049  5.2500914  5.9304114  3.3076559 
##        685        686        687        688        689        690 
##  4.0972970  2.2834942  1.4039693  5.1007696  3.0008490  5.3859965 
##        691        692        693        694        695        696 
##  2.7670710  1.6544843  2.4201421  3.1240518  1.8689314  5.1602921 
##        697        698        699        700        701        702 
##  2.5294874  5.5237977  2.8483109  5.0036805  3.5774351  7.3615648 
##        703        704        705        706        707        708 
##  5.4851950  0.7847748  0.6374977  1.4099135  8.3035937  3.3219614 
##        709        710        711        712        713        714 
##  3.2175084  1.2949496  3.5994613  2.8249728  0.9358100  6.4198257 
##        715        716        717        718        719        720 
##  3.4839133  3.5222365  2.5556289  3.7389387  2.4781097  3.0915357 
##        721        722        723        724        725        726 
##  4.6660655  4.6098107  4.8987467  4.1963328  1.2166089 10.3608589 
##        727        728        729        730        731        732 
##  0.4735381  3.0670822  6.6556883  1.3803291  4.1659516  3.7410261 
##        733        734        735        736        737        738 
##  4.1223589  0.9730105  4.5421021  1.2782271  7.6485597  2.7947362 
##        739        740        741        742        743        744 
##  9.0814280  9.6705147  2.1657788  2.1652359  7.3757937  1.7214274 
##        745        746        747        748        749        750 
##  2.2788054  0.8860709  1.0522990  4.5709963  1.5254241  1.7981396 
##        751        752        753        754        755        756 
##  1.9009966  2.6895476  5.4604503  4.8575549  7.7491669  1.3945218 
##        757        758        759        760        761        762 
##  4.3202688  2.1422802  5.0171025 10.3150356  3.1254685  2.6472621 
##        763        764        765        766        767        768 
##  1.0483454  2.1579525  1.7630681  2.7681676  0.8062491  0.4407182 
##        769        770        771        772        773        774 
##  6.1989781  4.0343703  5.5511428  2.2210316  1.9015843  3.6609317 
##        775        776        777        778        779        780 
##  1.3992653  3.6383975  1.0676408  3.9129060  3.5901719  2.2598395 
##        781        782        783        784        785        786 
##  4.4965230  7.3693162  1.8825841  1.6828019  0.6926423  3.1214864 
##        787        788        789        790        791        792 
##  4.2963511  4.3345305  4.8809531  3.8270923  5.1154678  3.1426524 
##        793        794        795        796        797        798 
##  3.5148339  5.3485104  0.9780816  2.2479603  7.4137000  2.0007219 
##        799        800        801        802        803        804 
##  3.0841149  2.4022445  0.8480520  4.8437762  3.3323724 13.1317379 
##        805        806        807        808        809        810 
##  2.6166114  2.1897949  4.1633716  4.5416996  3.3054268  1.6779157 
##        811        812        813        814        815        816 
##  3.0070535  7.8633547  4.6203038  2.9684493  1.6224402  2.9990385 
##        817        818        819        820        821        822 
##  2.9196033  5.9930543  1.7195965 15.6130146  0.9050942  5.8216025 
##        823        824        825        826        827        828 
##  3.3448247  1.0118405  4.6494449  3.6973592  7.0196112  0.8806658 
##        829        830        831        832        833        834 
##  7.2766057  2.2457516  4.1412465  1.6403492  2.4717111  3.2942215 
##        835        836        837        838        839        840 
##  0.8664218  1.9423853  1.3593691  2.7530336  2.1428333  3.1208769 
##        841        842        843        844        845        846 
##  2.8293284  4.2776824  7.2096458  3.5209170  2.1501473  8.6996362 
##        847        848        849        850        851        852 
##  1.3785648 14.1543591  2.6397123  2.5673132  5.7174487  3.1120147 
##        853        854        855        856        857        858 
##  6.9867756  3.1126195  5.1008074  2.8574561  6.3745125  3.2829172 
##        859        860        861        862        863        864 
##  1.6428575  3.0421143 14.2262052  2.9941600  2.1786753  3.9892547 
##        865        866        867        868        869        870 
##  2.6969942  0.9738778  5.1744029  6.4848799  5.1422490  4.7178120 
##        871        872        873        874        875        876 
##  2.7564450  1.1975720  2.0561249  0.4857804  4.9381649  2.3657251 
##        877        878        879        880        881        882 
##  2.0087199 11.5775493  0.4791625  0.9630152  5.3243972  1.7357878 
##        883        884        885        886        887        888 
##  4.0547022  9.6441351  7.5802503  0.9198217  3.6084483  1.1858998 
##        889        890        891        892        893        894 
##  5.5367391  2.0751419  5.8457549  6.2929777  3.5625024  7.6908928 
##        895        896        897        898        899        900 
##  3.7923750  8.1748557  4.3958230  2.6909169  6.7231554  4.2480604 
##        901        902        903        904        905        906 
##  3.4664853 11.4022794  3.9564332  2.8941924  3.7950406  1.7665919 
##        907        908        909        910        911        912 
##  3.9324513  0.5947103  7.3412788  2.4960643  1.8138528  8.2690942 
##        913        914        915        916        917        918 
##  5.0635937  0.6881932  1.8986903  2.8262846  2.0279902  7.0792350 
##        919        920        921        922        923        924 
##  3.8071901  0.9903548  4.8700719  3.8866017  0.4593950  2.3441106 
##        925        926        927        928        929        930 
##  2.6742826  1.6888444  3.9243582  5.1884089  5.5402709  0.9151151 
##        931        932        933        934        935        936 
##  9.4803364  0.5401805 11.6096588  2.0858818  2.8298206  3.5037975 
##        937        938        939        940        941        942 
##  3.0728314 11.8378406  3.4649068  2.2439646  4.2323575  7.6655048 
##        943        944        945        946        947        948 
##  6.5966245  4.2376641  5.9599536  2.3703888  1.8877847  5.3063382 
##        949        950        951        952        953        954 
##  1.2882895  4.3447045  7.2327505 10.4747770  0.7767247  5.2275118 
##        955        956        957        958        959        960 
##  0.9471382  3.4799751  6.0477645  1.5878067  5.5738301  2.2173210 
##        961        962        963        964        965        966 
##  5.3169680  2.0043298  3.4834206  3.8963421  2.5970285  0.7069927 
##        967        968        969        970        971        972 
##  3.0275779  1.1556376  2.9416765 11.3422992  3.4727793  2.3271346 
##        973        974        975        976        977        978 
##  4.7776213  6.6590629  3.0354810  3.2848722  4.3245859  1.2485055 
##        979        980        981        982        983        984 
##  5.4515304  1.2243751  2.9213821  3.2529825  4.0151480  1.0942334 
##        985        986        987        988        989        990 
## 13.0131660  1.0330519  2.8962681  2.1558059  1.9654590  4.9265545 
##        991        992        993        994        995        996 
##  3.8090156  1.0172603  1.4428652  0.5037952  6.7316702  4.1304136 
##        997        998        999       1000       1001       1002 
##  3.3061139  4.8728557  4.1981371  1.4347947  3.0187455  6.8846794 
##       1003       1004       1005       1006       1007       1008 
##  2.6029982  1.6814095  4.1867479  1.6074959  1.5704669  3.6710129 
##       1009       1010       1011       1012       1013       1014 
##  4.0378332  1.2498824  3.2307554  4.2126184  9.8628543  0.8350815 
##       1015       1016       1017       1018       1019       1020 
##  1.3435185  4.0332636  2.8796318  2.4417252  4.3790299  4.5226471 
##       1021       1022       1023       1024       1025       1026 
##  3.1220941  9.1629565  5.9823272  3.4941752  6.8823715  3.4937097 
##       1027       1028       1029       1030       1031       1032 
##  3.1534376  7.6274339  2.3670760  5.4468490  2.3538406  8.0014287 
##       1033       1034       1035       1036       1037       1038 
##  1.5719218  5.4097456  5.0279306  5.4903944  8.6781751  5.0949617 
##       1039       1040       1041       1042       1043       1044 
##  2.5097667  4.5731550  1.7404517  4.2407856  7.6194296  2.1131145 
##       1045       1046       1047       1048       1049       1050 
##  2.0753113  2.3143308  1.9606274 21.7013877  2.6390879  6.0663177 
##       1051       1052       1053       1054       1055       1056 
##  1.6789624  4.9319064  0.8224400  3.4701713  3.6015232  2.7395121 
##       1057       1058       1059       1060       1061       1062 
##  1.6630390  1.1448931  5.1158026  1.1053097  2.4726375  2.1625683 
##       1063       1064       1065       1066       1067       1068 
## 10.3533758  3.0007718  7.2913129  1.0665421  2.5954257  5.6876980 
##       1069       1070       1071       1072       1073       1074 
##  5.6999542  1.4346093  6.0498578  4.0018299  2.8065548  1.9464936 
##       1075       1076       1077       1078       1079       1080 
##  7.9261998  0.6153327  0.7215941  2.9554906  6.0653301  5.4998144 
##       1081       1082       1083       1084       1085       1086 
##  4.9441791  0.7222225  3.1972639  0.7626983  3.2763127 15.2494793 
##       1087       1088       1089       1090       1091       1092 
##  2.5368811  2.7995023  9.3864960  4.7938840  6.8726407  2.4295394 
##       1093       1094       1095       1096       1097       1098 
##  4.6280433  7.2628922  6.8051092  8.7050131  7.0823582  3.1464037 
##       1099       1100       1101       1102       1103       1104 
##  2.1356772  1.9176253  4.8111155  3.7975898  3.4022857  3.7415791 
##       1105       1106       1107       1108       1109       1110 
##  1.4208736  2.3049186  1.6634758  1.9336130  1.2567743  5.8170400 
##       1111       1112       1113       1114       1115       1116 
##  1.4086530  9.1241256  4.2564739  3.8429115  2.9934949  1.6885478 
##       1117       1118       1119       1120       1121       1122 
## 10.8412093  6.7499739  5.7787686  6.0022027  7.0315530  2.8636856 
##       1123       1124       1125       1126       1127       1128 
##  9.7933973  1.4030254 11.7171034  5.0789107  2.5163402  2.0337770 
##       1129       1130       1131       1132       1133       1134 
##  0.3182248  6.1525506 11.9865913  9.3039313  4.6510295  6.2697059 
##       1135       1136       1137       1138       1139       1140 
##  2.3595064  3.4478499  1.2514678  3.3761753  1.4060371  9.2844780 
##       1141       1142       1143       1144       1145       1146 
##  2.1623191  3.0137954  4.2000212  1.1273236  1.5481337  3.6226009 
##       1147       1148       1149       1150       1151       1152 
## 12.2462641  3.0259316  4.0284905  1.7477159  3.2198493  3.7623660 
##       1153       1154       1155       1156       1157       1158 
##  7.4927352  1.4068587  6.5692554  4.6241724 14.8128542  3.5971199 
##       1159       1160       1161       1162       1163       1164 
##  2.8555662  5.0800027  0.9217549  5.4075756  2.6606661  3.2076354 
##       1165       1166       1167       1168       1169       1170 
##  0.4376919  1.5835294  5.3460415  2.1751646  2.1081319  0.8346884 
##       1171       1172       1173       1174       1175       1176 
##  3.2198992  1.7409729  6.0264997  0.9619918  1.6569104  2.6437842 
##       1177       1178       1179       1180       1181       1182 
##  2.9676384  0.5689064  3.1652403  1.7455666  3.8080649  2.1889866 
##       1183       1184       1185       1186       1187       1188 
##  2.1047351  0.9558746  2.1173773  2.3409601 10.7590860  3.8062428 
##       1189       1190       1191       1192       1193       1194 
##  1.6146831  2.3752405  1.5068337  2.2940844  2.0435104  2.6085444 
##       1195       1196       1197       1198       1199       1200 
##  3.1794015  5.6821790  7.9074132  1.6154916  0.3944365  1.7110829 
##       1201       1202       1203       1204       1205       1206 
##  1.8597172  3.9061054  2.7556762  1.0900528  5.9474908  4.4779911 
##       1207       1208       1209       1210       1211       1212 
##  4.9814958  5.5099377  1.3811939  3.2951604  0.4562942  1.4700768 
##       1213       1214       1215       1216       1217       1218 
##  4.8602572  1.8144277  1.1244588  5.5091280  1.9229146  3.1937819 
##       1219       1220       1221       1222       1223       1224 
##  5.6413533  2.8372935  2.3869993  1.7015485  2.5986928  6.4957836 
##       1225       1226       1227       1228       1229       1230 
##  1.4444833  3.4023678  5.8209679  2.3778752  3.0179623  2.9653632 
##       1231       1232       1233       1234       1235       1236 
## 16.7461313  6.0500161  4.8305580  4.6446783  1.2291810  2.0434996 
##       1237       1238       1239       1240       1241       1242 
##  6.6044697  3.1505111  4.3936664  5.8712465  9.8756674 10.2541792 
##       1243       1244       1245       1246       1247       1248 
##  6.2640604  2.1015182  3.7847527 12.5395170  3.5753791  6.5031190 
##       1249       1250       1251       1252       1253       1254 
##  6.4530359  5.4599826  2.6758805  5.5038890  3.4189804  5.8844952 
##       1255       1256       1257       1258       1259       1260 
##  1.4089868  2.9393924  4.2462482  1.6534447  5.3900449  3.5608777 
##       1261       1262       1263       1264       1265       1266 
##  3.7131142  2.6438795  0.5009727  0.4362710  1.4772129  6.7071544 
##       1267       1268       1269       1270       1271       1272 
##  2.4519302  6.8334493  3.3301632  3.2349317  1.7707508  2.7707180 
##       1273       1274       1275       1276       1277       1278 
## 11.3297752  0.6374115  2.4643922  4.2744224  2.4973364  1.3641525 
##       1279       1280       1281       1282       1283       1284 
##  0.7256340  2.1234848  1.8336781  1.9170480  5.4420653  2.5800570 
##       1285       1286       1287       1288       1289       1290 
##  8.6895474  2.5054038  5.8859059  0.8603203 10.3931552  0.9975262 
##       1291       1292       1293       1294       1295       1296 
##  3.7498684  8.6108747  3.7198034  3.6345672  3.9046951  3.6145974 
##       1297       1298       1299       1300       1301       1302 
##  3.4718712  1.8398409  1.2995829  2.6209920 19.3610536 10.5410779 
##       1303       1304       1305       1306       1307       1308 
##  3.4070134  6.0839503  2.2526852  2.2873526  3.7698789  6.9893457 
##       1309       1310       1311       1312       1313       1314 
##  6.5360983  1.1020209  0.8273351  1.5639002  5.4092323  8.6942485 
##       1315       1316       1317       1318       1319       1320 
##  4.8519179  2.4472377  5.0818309 19.5350028  8.2905599  1.2859469 
##       1321       1322       1323       1324       1325       1326 
##  3.3205584  5.2538825  5.4544134  8.4035325  1.0852102  3.7066954 
##       1327       1328       1329       1330       1331       1332 
##  1.3865740  1.0794747  4.5245291  3.4330925  3.1166859  2.4223183 
##       1333       1334       1335       1336       1337       1338 
##  8.9380683  3.3090506  3.3972148  4.9139610  3.0048990  4.6101159
# Assessing Outliers
outlierTest(trans.lm.fit, n.max = 20) # Bonferroni-adjusted p-values for the most extreme observations
##      rstudent unadjusted p-value Bonferonni p
## 517  5.413579         7.3233e-08   9.7986e-05
## 1301 5.107131         3.7474e-07   5.0140e-04
## 220  4.862537         1.2972e-06   1.7356e-03
## 1020 4.757778         2.1712e-06   2.9051e-03
## 431  4.723473         2.5646e-06   3.4314e-03
## 243  4.606212         4.4943e-06   6.0133e-03
## 527  4.518048         6.7959e-06   9.0929e-03
## 1207 4.485408         7.9058e-06   1.0578e-02
## 937  4.317248         1.6972e-05   2.2708e-02
## 1040 4.230762         2.4888e-05   3.3300e-02
## 103  4.179440         3.1134e-05   4.1657e-02
## 600  4.154718         3.4650e-05   4.6362e-02
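
The last column of the table above is the unadjusted p-value multiplied by the number of observations tested, capped at 1. A quick arithmetic check on the first row, as a sketch that assumes n = 1338 rows (the highest observation index printed earlier), reproduces the printed 9.7986e-05:

```r
# Values copied from the first row of the outlierTest() output (observation 517)
n_obs   <- 1338        # assumption: number of rows in `insurance`
p_unadj <- 7.3233e-08  # unadjusted p-value as printed

# Bonferroni adjustment: multiply by the number of tests, cap at 1
p_bonf <- min(1, n_obs * p_unadj)
signif(p_bonf, 5)
```

Because every row is tested, the adjustment is strict: only observations whose studentized residual is extreme relative to all 1338 rows survive the 0.05 cutoff.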
qqPlot(trans.lm.fit, main = "QQ Plot") # QQ plot of studentized residuals

## [1]  517 1301
#leveragePlots(trans.lm.fit) # leverage plots
# Drop the 12 rows flagged by outlierTest() above (Bonferroni p < 0.05)
clean.insurance <- insurance %>%
  dplyr::slice(-c(517, 1301, 220, 1020, 431, 243, 527, 1207, 937, 1040, 103, 600))
lm.fit2 <- update(trans.lm.fit, .~., data = clean.insurance) 
lm.fit2 %>%
  summary()
## 
## Call:
## lm(formula = charges^yTransformer ~ age + bmi + children + smoker + 
##     region + bmi:smoker, data = clean.insurance)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1043.00  -161.96  -106.82   -33.49  1853.69 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -67.3316    71.5479  -0.941   0.3468    
## age                28.1873     0.7964  35.393  < 2e-16 ***
## bmi                 2.1714     2.1351   1.017   0.3093    
## children           57.3645     9.1825   6.247 5.63e-10 ***
## smokeryes       -1365.8695   137.0731  -9.965  < 2e-16 ***
## regionnorthwest   -72.0395    31.8166  -2.264   0.0237 *  
## regionsoutheast  -130.3329    31.9558  -4.079 4.80e-05 ***
## regionsouthwest  -128.5052    31.9086  -4.027 5.96e-05 ***
## bmi:smokeryes     114.8525     4.3749  26.253  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 403 on 1317 degrees of freedom
## Multiple R-squared:  0.8642, Adjusted R-squared:  0.8633 
## F-statistic:  1047 on 8 and 1317 DF,  p-value: < 2.2e-16
  #residualPlot()
  #spreadLevelPlot()
  #plot()
  #outlierTest()
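
The row numbers removed above were typed in by hand from the test output. A minimal base-R sketch of collecting them programmatically follows. It uses the built-in `mtcars` data and a hypothetical model `fit` as stand-ins (not the insurance data), and it mirrors the usual recipe of studentized residuals, a t reference distribution, and a Bonferroni adjustment rather than calling `outlierTest()` itself:

```r
# Stand-in model on a built-in dataset; swap in trans.lm.fit and insurance
fit <- lm(mpg ~ wt + hp, data = mtcars)

rs      <- rstudent(fit)                  # studentized (deleted) residuals
df_t    <- df.residual(fit) - 1           # df of the t reference distribution
p_unadj <- 2 * pt(abs(rs), df = df_t, lower.tail = FALSE)
p_bonf  <- pmin(1, p_unadj * length(rs))  # Bonferroni adjustment

flagged <- which(p_bonf < 0.05)           # row positions to drop

# Guard the empty case: indexing with -integer(0) would drop every row
clean <- if (length(flagged) > 0) mtcars[-flagged, ] else mtcars
refit <- lm(formula(fit), data = clean)
```

The guard on `flagged` matters: negative indexing with an empty vector selects nothing, so an unguarded `mtcars[-flagged, ]` would return an empty data frame whenever no row is flagged.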

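High Leverage Points and Multicollinearity

The variance inflation factors from vif() check for collinearity among the predictors, while the diagnostic plots from plot(lm.fit2) include the residuals-versus-leverage panel used to spot high leverage points.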

vif(lm.fit2)
##                 GVIF Df GVIF^(1/(2*Df))
## age         1.016996  1        1.008462
## bmi         1.390277  1        1.179100
## children    1.003489  1        1.001743
## smoker     25.076390  1        5.007633
## region      1.100843  3        1.016142
## bmi:smoker 25.374504  1        5.037311
plot(lm.fit2)
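
The last column of the vif() table is GVIF^(1/(2*Df)), which makes terms with different degrees of freedom comparable; for a one-df term it is simply the square root of the GVIF. The large values for smoker and bmi:smoker are what one would typically expect when an interaction enters the model alongside its own main effect. The sketch below checks the printed arithmetic and then computes a VIF by hand as 1 / (1 - R^2); the second part uses `mtcars` as a stand-in dataset, not the insurance data:

```r
# Printed GVIF values and their degrees of freedom, copied from the table above
gvif <- c(smoker = 25.076390, region = 1.100843)
df   <- c(smoker = 1,         region = 3)
adj  <- gvif^(1 / (2 * df))   # should reproduce the last column of the table
adj

# VIF by hand for a one-df predictor: regress it on the other predictors
# (stand-in data; wt, hp and disp are columns of mtcars)
r2_wt  <- summary(lm(wt ~ hp + disp, data = mtcars))$r.squared
vif_wt <- 1 / (1 - r2_wt)
```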

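# Apply the response transformation up front so the final formula can use plain `charges`
ins.copy <- insurance
ins.copy$charges <- ins.copy$charges^yTransformer
clean.insurance$charges <- clean.insurance$charges^yTransformer
# Final model: region is dropped and the fit uses ins.copy (all rows, outliers included).
# bmi:smoker adds only the interaction, since both main effects are already in the formula.
lm.final <- lm(charges ~ age + bmi + smoker + children + bmi:smoker, data = ins.copy)
confint(lm.final) # 95% confidence intervals on the transformed charges scale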
##                      2.5 %       97.5 %
## (Intercept)    -217.650756    82.694729
## age              25.909177    29.360742
## bmi              -3.665522     5.321269
## smokeryes     -1653.795339 -1056.007721
## children         37.804963    77.771027
## bmi:smokeryes   104.454360   123.539644
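
Each interval above is the coefficient estimate plus or minus a t quantile times its standard error, expressed on the transformed charges scale. An interval that spans zero, such as the one for bmi, means that term cannot be distinguished from zero at the 5% level. The sketch below rebuilds the intervals by hand and compares them with confint(); `mtcars` and the model `fit` are stand-ins, not the insurance model:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)      # stand-in model

est    <- coef(fit)                          # point estimates
se     <- sqrt(diag(vcov(fit)))              # standard errors
t_crit <- qt(0.975, df = df.residual(fit))   # two-sided 95% critical value

manual_ci <- cbind(lower = est - t_crit * se,
                   upper = est + t_crit * se)

# Compare with the built-in intervals, ignoring dimnames
same <- isTRUE(all.equal(unname(manual_ci), unname(confint(fit))))
same
```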

Sources:

  1. https://www.kaggle.com/mirichoi0218/insurance/home
  2. An Introduction to Statistical Learning
  3. Wikipedia
  4. https://www.statmethods.net/stats/rdiagnostics.html
  5. https://www.statmethods.net/stats/regression.html
  6. https://datascienceplus.com/how-to-detect-heteroscedasticity-and-rectify-it/